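Salient object detection in optical remote sensing images (RSI-SOD) is a very difficult task because of the extreme complexity of object scales and shapes and the uncertainty of the predicted locations. Existing SOD methods achieve satisfactory detection performance on natural scene images, but these characteristics of remote sensing images mean that they do not adapt well to RSI-SOD. In this paper, we propose a novel attention-guided network (AGNet) for SOD in optical RSIs, consisting of a position enhancement stage and a detail refinement stage. Specifically, the position enhancement stage comprises a semantic attention module and a contextual attention module, which accurately describe the approximate location of salient objects. The detail refinement stage uses the proposed self-refinement module to progressively refine the predictions under the guidance of attention and reverse attention. In addition, a hybrid loss supervises the training of the network, improving the model's performance from three perspectives: pixels, regions, and statistics. Extensive experiments on two popular benchmarks show that AGNet achieves competitive performance compared with other state-of-the-art methods. The code will be available at https://github.com/nuaayh/agnet.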
Translated by Google Translate
When the surface of a three-dimensional object is projected onto a two-dimensional plane, perspective distortion causes the image on the object's surface to be distorted to a degree that depends on the surface curvature. This paper presents an approximate method for flattening this type of distortion on the surface of a regularly curved body. The main idea is to roughly estimate, from the contour curve of the object's two-dimensional image, a gridded surface subdivision that describes the surface of the three-dimensional object. Each grid block, whatever its size and shape, is then inversely transformed into a rectangle of exactly the same shape and size. Finally, these identical rectangles are spliced and recombined in order to obtain a roughly flat rectangle. This paper introduces the specific process and results of using this method to flatten bent pages, and then demonstrates the feasibility and limitations of the method.
Diabetic retinopathy (DR) is a complication of diabetes and one of the major causes of vision impairment in the global population. As the early-stage manifestation of DR is usually very mild and hard to detect, an accurate diagnosis via eye screening is clinically important to prevent vision loss at later stages. In this work, we propose an ensemble method to automatically grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA) images available from the Diabetic Retinopathy Analysis Challenge (DRAC) 2022. First, we adopt state-of-the-art classification networks, i.e., ResNet, DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with different splits of the available dataset. Ultimately, we obtain 25 models, of which the top 16 are selected and ensembled to generate the final predictions. During training, we also investigate a multi-task learning strategy and add an auxiliary classification task, Image Quality Assessment, to improve model performance. Our final ensemble model achieved a quadratic weighted kappa (QWK) of 0.9346 and an Area Under Curve (AUC) of 0.9766 on the internal testing dataset, and a QWK of 0.839 and an AUC of 0.8978 on the DRAC challenge testing dataset.
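For readers unfamiliar with the quadratic weighted kappa reported above, the following is a minimal sketch of the standard metric for ordinal grades. It is not the authors' evaluation code; the function name and the integer label encoding `0..num_classes-1` are assumptions made for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_classes-1.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")

    observed = Counter(zip(y_true, y_pred))  # joint counts of (true, predicted)
    true_hist = Counter(y_true)              # marginal counts of true grades
    pred_hist = Counter(y_pred)              # marginal counts of predicted grades

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Disagreement weight grows with the squared distance between grades.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Expected count if true and predicted grades were independent.
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement; errors between distant grades are penalized more than errors between adjacent grades.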
Orthogonality constraints, both hard and soft, have been used to normalize the weight matrices of Deep Neural Network (DNN) models, especially Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), to reduce model parameter redundancy and improve training stability. However, the robustness of these constrained models to noisy data is not always satisfactory. In this work, we propose a novel two-stage approximately orthogonal training framework (TAOTF) that finds a trade-off between the orthogonal solution space and the main-task solution space to address this problem in noisy data scenarios. In the first stage, we propose a novel algorithm called polar decomposition-based orthogonal initialization (PDOI) to find a good initialization for the orthogonal optimization. In the second stage, unlike existing methods, we apply soft orthogonal constraints to all layers of the DNN model. We evaluate the proposed model-agnostic framework on both natural image and medical image datasets; the results show that our method achieves stable performance superior to that of existing methods.
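The two ingredients named in the abstract can be illustrated with textbook building blocks. The sketch below computes the orthogonal polar factor of a weight matrix via SVD and a common soft orthogonality penalty, the squared Frobenius norm of W^T W - I. The abstract does not give the details of PDOI or of the exact constraint used, so both function names and the choice of penalty are assumptions, not the authors' implementation.

```python
import numpy as np


def polar_orthogonal_factor(w):
    """Return the polar factor U V^T of w = U S V^T.

    For a matrix with at least as many rows as columns, the result has
    orthonormal columns and is the closest such matrix to w in Frobenius norm.
    """
    u, _, vt = np.linalg.svd(w, full_matrices=False)
    return u @ vt


def soft_orthogonality_penalty(w):
    """Squared Frobenius norm of (W^T W - I).

    Assumes rows >= columns; this is one common soft constraint and is an
    assumption here, not necessarily the one used in the paper.
    """
    gram = w.T @ w
    return float(np.sum((gram - np.eye(w.shape[1])) ** 2))


rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 4))               # stand-in for a layer's weights
initialized = polar_orthogonal_factor(weights)  # orthogonal starting point
```

Starting from `initialized`, the penalty is numerically zero; adding a scaled copy of it to the task loss during training lets the weights drift from exact orthogonality only as far as the task requires.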
The rapid development of aspect-based sentiment analysis (ABSA) in recent decades shows great potential for real-world applications. Current ABSA work, however, is mostly limited to the scenario of a single piece of text, leaving dialogue contexts unexplored. In this work, we introduce a novel task of conversational aspect-based sentiment quadruple analysis, namely DiaASQ, which aims to detect the target-aspect-opinion-sentiment quadruples in a dialogue. DiaASQ bridges the gap between fine-grained sentiment analysis and conversational opinion mining. We manually construct a large-scale, high-quality Chinese dataset and also obtain an English version via manual translation. We propose a neural model to benchmark the task. It performs end-to-end quadruple prediction effectively and incorporates rich dialogue-specific and discourse feature representations for better cross-utterance quadruple extraction. Finally, we point out several potential directions for follow-up research on this new task. The DiaASQ data is open at https://github.com/unikcc/DiaASQ
Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform. Existing methods usually focus on optimizing the distance between one embedding and the others in the feature space, while neglecting the redundancy of the embedding itself. In this paper, we argue that low redundancy is also important, as it motivates the model to mine more diverse patterns. To verify this point, we introduce a simple yet effective regularization, i.e., Dynamic Weighted Decorrelation Regularization (DWDR), to explicitly encourage networks to learn independent embedding channels. As the name implies, DWDR regresses the embedding correlation coefficient matrix to a sparse matrix, i.e., the identity matrix, with dynamic weights. The dynamic weights focus training on channels that are still correlated. In addition, we propose a cross-view symmetric sampling strategy, which keeps the examples balanced between platforms. Albeit simple, the proposed method achieves competitive results on three large-scale benchmarks, i.e., University-1652, CVUSA and CVACT. Moreover, under harsh conditions, e.g., an extremely short feature of 64 dimensions, the proposed method surpasses the baseline model by a clear margin.
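The idea of regressing the channel correlation matrix to the identity can be sketched as follows. The abstract does not state how the dynamic weights are computed, so the weighting below (proportional to the current absolute correlation) is purely a hypothetical choice for illustration, as is the function name; this is not the authors' DWDR implementation.

```python
import numpy as np


def weighted_decorrelation_loss(embeddings, eps=1e-8):
    """Weighted squared distance between the channel correlation matrix and I.

    embeddings: array of shape (batch, channels).
    """
    # Center and scale each channel so that x.T @ x is the correlation matrix.
    x = embeddings - embeddings.mean(axis=0, keepdims=True)
    x = x / (np.linalg.norm(x, axis=0, keepdims=True) + eps)
    corr = x.T @ x

    residual = (corr - np.eye(corr.shape[0])) ** 2

    # Hypothetical dynamic weights: channel pairs that are still strongly
    # correlated receive more weight. In a training framework these weights
    # would be treated as constants (no gradient flows through them).
    weights = np.abs(corr)
    np.fill_diagonal(weights, 1.0)
    weights = weights / weights.sum()

    return float(np.sum(weights * residual))


independent = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
redundant = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]])
```

With these toy batches, `independent` has uncorrelated channels and yields a loss near zero, while `redundant` duplicates one channel and yields a loss near 0.5.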
We study discrete distribution estimation under user-level local differential privacy (LDP). In user-level $\varepsilon$-LDP, each user has $m\ge1$ samples and the privacy of all $m$ samples must be preserved simultaneously. We resolve the following dilemma: While on the one hand having more samples per user should provide more information about the underlying distribution, on the other hand, guaranteeing the privacy of all $m$ samples should make the estimation task more difficult. We obtain tight bounds for this problem under almost all parameter regimes. Perhaps surprisingly, we show that in suitable parameter regimes, having $m$ samples per user is equivalent to having $m$ times more users, each with only one sample. Our results demonstrate interesting phase transitions for $m$ and the privacy parameter $\varepsilon$ in the estimation risk. Finally, connecting with recent results on shuffled DP, we show that combined with random shuffling, our algorithm leads to optimal error guarantees (up to logarithmic factors) under the central model of user-level DP in certain parameter regimes. We provide several simulations to verify our theoretical findings.
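As background for the single-sample case ($m = 1$), the sketch below shows k-ary randomized response, a classic $\varepsilon$-LDP mechanism for discrete distribution estimation, together with its unbiased frequency estimator. It is a baseline illustration only and not the user-level algorithm of the paper; the function names are assumptions.

```python
import math
import random


def k_rr_privatize(value, k, epsilon, rng):
    """Report the true value with probability e^eps / (e^eps + k - 1),
    otherwise a uniformly random different value in 0..k-1."""
    if k < 2 or epsilon <= 0:
        raise ValueError("need k >= 2 and epsilon > 0")
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    other = rng.randrange(k - 1)  # index among the k - 1 other values
    return other if other < value else other + 1


def k_rr_estimate(reports, k, epsilon):
    """Unbiased estimate of the underlying distribution from privatized reports."""
    if k < 2 or epsilon <= 0 or not reports:
        raise ValueError("need k >= 2, epsilon > 0 and at least one report")
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting any specific other value
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    n = len(reports)
    # E[count_j / n] = q + (p - q) * pi_j, so invert that affine map.
    return [(c / n - q) / (p - q) for c in counts]
```

The estimates always sum to one but may be slightly negative for rare symbols; projecting them onto the probability simplex is a common post-processing step.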
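This paper proposes a method for learning continuous control policies for active landmark localization and exploration using an information-theoretic cost. We consider a mobile robot that detects landmarks within a limited sensing range, and tackle the problem of learning a control policy that maximizes the mutual information between the landmark states and the sensor observations. We employ a Kalman filter to convert the partially observable problem in the landmark states into a Markov decision process (MDP), a differentiable field of view to shape the reward, and an attention-based neural network to represent the control policy. The approach is further unified with active volumetric mapping to promote exploration in addition to landmark localization. The performance is demonstrated in several simulated landmark localization tasks in comparison with benchmark methods.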
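For a linear-Gaussian observation model, the mutual information between a landmark state and an observation has a closed form in terms of the Kalman filter covariances: half the difference between the log-determinants of the prior and posterior covariances. The sketch below computes this quantity; the function name is an assumption, and modelling an out-of-range landmark with a zero observation matrix is a simplification chosen here, not the paper's differentiable field-of-view formulation.

```python
import numpy as np


def kalman_information_gain(prior_cov, h, noise_cov):
    """Mutual information (in nats) between state x ~ N(mu, prior_cov) and the
    observation z = h x + v with v ~ N(0, noise_cov), plus the posterior covariance."""
    innovation_cov = h @ prior_cov @ h.T + noise_cov
    gain = prior_cov @ h.T @ np.linalg.inv(innovation_cov)
    post_cov = (np.eye(prior_cov.shape[0]) - gain @ h) @ prior_cov

    _, logdet_prior = np.linalg.slogdet(prior_cov)
    _, logdet_post = np.linalg.slogdet(post_cov)
    return 0.5 * (logdet_prior - logdet_post), post_cov


prior = np.array([[1.0]])
noise = np.array([[1.0]])

# Landmark inside the sensing range: the sensor observes the state directly.
info_in_range, post_in_range = kalman_information_gain(prior, np.array([[1.0]]), noise)

# Landmark outside the sensing range, modelled here (as an assumption) by a
# zero observation matrix: the observation carries no information.
info_out_of_range, _ = kalman_information_gain(prior, np.array([[0.0]]), noise)
```

Because the information gain depends on the robot's pose only through which landmarks are observable and how accurately, a reward based on it encourages motions that bring uncertain landmarks into view.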
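Event cameras are motion-activated sensors that capture pixel-level illumination changes instead of intensity images at a fixed frame rate. Compared with standard cameras, they can provide reliable visual perception during high-speed motion and in high dynamic range scenes. However, event cameras output only a little information, or even noise, when the relative motion between the camera and the scene is limited, such as in a static state. Standard cameras, by contrast, provide rich perception information in most scenarios, especially in good lighting conditions. The two types of camera are therefore complementary. In this paper, we propose a robust, highly accurate, real-time optimization-based event-based visual-inertial odometry (VIO) method with event-corner features, line-based event features, and point-based image features. The proposed method leverages point-based features in natural scenes and line-based features in human-made scenes to provide additional structure or constraint information through well-designed feature management. Experiments on public benchmark datasets show that our method achieves superior performance compared with image-based or event-based VIO. Finally, we use our method to demonstrate onboard closed-loop autonomous quadrotor flight and large-scale outdoor experiments. Videos of the evaluations are presented on our project website: https://b23.tv/oe3qm6j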
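A growing body of research in natural language processing (NLP) and natural language understanding (NLU) is investigating human-like knowledge learned or encoded in the word embeddings of large language models. This is a step towards understanding what knowledge language models capture that resembles human understanding of language and communication. Here, we investigate whether and how the affect meaning of a word (i.e., valence, arousal, dominance) is encoded in word embeddings pre-trained in large neural networks. We use a human-labeled dataset as the ground truth and perform various correlational and classification tests on four types of word embeddings. The embeddings vary in being static or contextualized, and in how much affect-specific information is prioritized during the pre-training and fine-tuning phases. Our analyses show that word embeddings from the vanilla BERT model do not saliently encode the affect information of English words. Only when the BERT model is fine-tuned on emotion-related tasks, or contains extra contextualized information from emotion-rich contexts, can the corresponding embeddings encode more relevant affect information.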